On the Long-Tail Entities in News

Authors

  • José Esquivel
  • M-Dyaa Albakour
  • Miguel Martinez-Alvarez
  • David Corney
  • Samir Moussa
Abstract

Long-tail entities pose unique challenges for state-of-the-art entity linking systems because they are under-represented in general knowledge bases. This paper studies long-tail entities in news corpora. We conduct experiments on a large news collection of one million articles, devise an approach for measuring the volume of such entities in news, and uncover insights into the challenges of linking these entities to general knowledge bases.

Similar resources

News-Based Group Modeling and Forecasting

In this paper, we study news group modeling and forecasting methods using quantitative data generated by our large-scale natural language processing (NLP) text analysis system. A news group is a set of news entities, like top U.S. cities, governors, senators, golfers, or movie actors. Our fame distribution analysis of news groups shows that lognormal and power-law distributions generally could d...

Catching the Long-Tail: Extracting Local News Events from Twitter

Twitter, used in 200 countries with over 250 million tweets a day, is a rich source of local news from around the world. Many events of local importance are first reported on Twitter, including many that never reach news channels. Further, there are often only a few tweets reporting each such event, in contrast with the larger volumes that follow events of wider significance. Even though such e...

Content-Based Collaborative Filtering for News Topic Recommendation

News recommendation has become a big attraction with which major Web search portals retain their users. Two effective approaches are Content-based Filtering and Collaborative Filtering, each serving a specific recommendation scenario. The Content-based Filtering approaches inspect rich contexts of the recommended items, while the Collaborative Filtering approaches predict the interests of long-...

PAYMA: A Tagged Corpus of Persian Named Entities

The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...

Breaking Bad News to Patients and Its Various Aspects

Breaking bad news to patients does not have a long history and is a controversial issue between patients and physicians. Many physicians are reluctant to break bad news to patients, and this is not desirable for most patients. For example, in Northern European countries and the United States, most physicians usually break bad news to patients, while in Southern and Eastern European cou...

Publication date: 2017